 parameter independence


Learning Bayesian Networks: A Unification for Discrete and Gaussian Domains

Heckerman, David, Geiger, Dan

arXiv.org Artificial Intelligence

At last year's conference, we presented approaches for learning Bayesian networks from a combination of prior knowledge and statistical data. These approaches were presented in two papers: one addressing domains containing only discrete variables (Heckerman et al., 1994), and the other addressing domains containing continuous variables related by an unknown multivariate-Gaussian distribution (Geiger and Heckerman, 1994). Unfortunately, these presentations were substantially different, making the parallels between the two methods difficult to appreciate. In this paper, we unify the two approaches. In particular, we abstract our previous assumptions of likelihood equivalence, parameter modularity, and parameter independence such that they are appropriate for discrete and Gaussian domains (as well as other domains). Using these assumptions, we derive a domain-independent Bayesian scoring metric. We then use this general metric in combination with well-known statistical facts about the Dirichlet and normal-Wishart distributions to derive our metrics for discrete and Gaussian domains. In addition, we provide simple proofs that these assumptions are consistent for both domains.
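To make the "domain-independent Bayesian scoring metric" concrete for the discrete case, here is a minimal sketch of the standard Bayesian-Dirichlet family score: the log marginal likelihood of one node given its parents under a Dirichlet prior, with complete data. The function name and the nested-list layout of `counts` and `alpha` are illustrative assumptions, not taken from the paper.

```python
from math import lgamma


def family_log_marginal(counts, alpha):
    """Log marginal likelihood of one node given its parents (Dirichlet prior).

    counts[j][k]: number of cases with parent configuration j and child state k
    alpha[j][k]:  Dirichlet hyperparameter for the same cell
    """
    total = 0.0
    for n_row, a_row in zip(counts, alpha):
        a_j = sum(a_row)  # total prior weight for this parent configuration
        n_j = sum(n_row)  # number of cases with this parent configuration
        total += lgamma(a_j) - lgamma(a_j + n_j)
        for n_jk, a_jk in zip(n_row, a_row):
            total += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return total
```

For a binary node with no parents, a uniform Dirichlet(1, 1) prior and a single observation, the score is log(1/2), the prior predictive probability of that observation. Under parameter independence and modularity, the log score of a whole network is the sum of these family scores over its nodes.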


Likelihoods and Parameter Priors for Bayesian Networks

Heckerman, David, Geiger, Dan

arXiv.org Machine Learning

We develop simple methods for constructing likelihoods and parameter priors for learning about the parameters and structure of a Bayesian network. In particular, we introduce several assumptions that permit the construction of likelihoods and parameter priors for a large number of Bayesian-network structures from a small set of assessments. The most notable assumption is that of likelihood equivalence, which says that data cannot help to discriminate network structures that encode the same assertions of conditional independence. We describe the constructions that follow from these assumptions, and also present a method for directly computing the marginal likelihood of a random sample with no missing observations. Also, we show how these assumptions lead to a general framework for characterizing parameter priors of multivariate distributions.
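Likelihood equivalence can be checked numerically on the smallest case. The sketch below assumes two binary variables and a BDeu-style prior (uniform hyperparameters scaled by an equivalent sample size `ess`, a choice made here for illustration). It scores X → Y and Y → X, two structures that encode the same (empty) set of independence assertions, on the same complete data. The helper names are hypothetical.

```python
from math import lgamma


def dirichlet_block(counts, alphas):
    """Log marginal likelihood contribution of one parent configuration."""
    a, n = sum(alphas), sum(counts)
    s = lgamma(a) - lgamma(a + n)
    for n_k, a_k in zip(counts, alphas):
        s += lgamma(a_k + n_k) - lgamma(a_k)
    return s


def bdeu_two_var(table, ess, parent_is_x):
    """Score X -> Y (parent_is_x=True) or Y -> X for binary X, Y.

    table[x][y] holds the joint counts; ess is the equivalent sample size.
    """
    if parent_is_x:
        root_counts = [sum(table[0]), sum(table[1])]           # counts of X
        child_rows = [table[0], table[1]]                       # Y given X = x
    else:
        root_counts = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
        child_rows = [[table[0][y], table[1][y]] for y in range(2)]  # X given Y = y
    score = dirichlet_block(root_counts, [ess / 2] * 2)        # root: 2 states
    for row in child_rows:
        score += dirichlet_block(row, [ess / 4] * 2)           # 2 configs x 2 states
    return score


counts = [[10, 3], [2, 7]]
print(bdeu_two_var(counts, 4.0, True), bdeu_two_var(counts, 4.0, False))
```

Both calls return the same value (up to floating-point rounding): with these hyperparameters the root-count terms cancel and the marginal likelihood depends only on the joint cell counts, which is exactly the "data cannot discriminate" property.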


Parameter Priors for Directed Acyclic Graphical Models and the Characterization of Several Probability Distributions

Geiger, Dan, Heckerman, David

arXiv.org Machine Learning

We develop simple methods for constructing parameter priors for model choice among Directed Acyclic Graphical (DAG) models. In particular, we introduce several assumptions that permit the construction of parameter priors for a large number of DAG models from a small set of assessments. We then present a method for directly computing the marginal likelihood of every DAG model given a random sample with no missing observations. We apply this methodology to Gaussian DAG models, which consist of a recursive set of linear regression models. We show that the only parameter prior for complete Gaussian DAG models that satisfies our assumptions is the normal-Wishart distribution. Our analysis is based on the following new characterization of the Wishart distribution: let $W$ be an $n \times n$, $n \ge 3$, positive-definite symmetric matrix of random variables and $f(W)$ be a pdf of $W$. Then, $f(W)$ is a Wishart distribution if and only if $W_{11} - W_{12} W_{22}^{-1} W'_{12}$ is independent of $\{W_{12},W_{22}\}$ for every block partitioning $W_{11},W_{12}, W'_{12}, W_{22}$ of $W$. Similar characterizations of the normal and normal-Wishart distributions are provided as well.
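The quantity $W_{11} - W_{12} W_{22}^{-1} W'_{12}$ in the characterization is the Schur complement of $W_{22}$ in $W$. The sketch below only illustrates that quantity for one block partitioning of a fixed positive-definite matrix. It does not test the independence property, which concerns the distribution of a random $W$. The function name and the choice of a 4 × 4 example are assumptions made for illustration.

```python
import numpy as np


def schur_complement(W, k):
    """Return W11 - W12 W22^{-1} W12' where W11 is the leading k x k block."""
    W11 = W[:k, :k]
    W12 = W[:k, k:]
    W22 = W[k:, k:]
    # Solve W22 X = W12' instead of forming W22^{-1} explicitly.
    return W11 - W12 @ np.linalg.solve(W22, W12.T)


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
W = A @ A.T + 4 * np.eye(4)  # symmetric positive definite, n = 4 >= 3
S = schur_complement(W, 2)

# det(W) = det(W22) * det(S), and S inherits positive definiteness from W.
print(np.linalg.det(W), np.linalg.det(W[2:, 2:]) * np.linalg.det(S))
print(np.linalg.eigvalsh(S))
```

The determinant factorization is why the Schur complement appears naturally when a Wishart density is split into a conditional and a marginal part.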


Constraint-Based Learning for Continuous-Time Bayesian Networks

Bregoli, Alessandro, Scutari, Marco, Stella, Fabio

arXiv.org Artificial Intelligence

Dynamic Bayesian networks have been well explored in the literature as discrete-time models; however, their continuous-time extensions have seen comparatively little attention. In this paper, we propose the first constraint-based algorithm for learning the structure of continuous-time Bayesian networks. We discuss the different statistical tests and the underlying hypotheses used by our proposal to establish conditional independence. Finally, we validate its performance using synthetic data, and discuss its strengths and limitations. We find that score-based learning is more accurate for networks with binary variables, while our constraint-based approach is more accurate for variables that take more than two values. However, further experiments are needed to confirm these findings.
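The abstract does not spell out the tests, so the following is only a sketch of one standard test that fits the description, not necessarily the one used in the paper. To ask whether a candidate parent changes how long a variable stays in a given state, one can compare the exponential holding times observed under two parent configurations. If the $T_i$ are i.i.d. $\mathrm{Exp}(\lambda)$, then $2\lambda\sum_i T_i \sim \chi^2_{2n}$, so under equal rates the ratio of sample means follows $F(2n_a, 2n_b)$. The function name and its inputs are hypothetical.

```python
from scipy.stats import f


def rate_equality_test(times_a, times_b):
    """Two-sided F-test that two samples of exponential holding times share a rate.

    times_a, times_b: observed sojourn times of a variable in one state,
    collected under two different parent configurations.
    """
    n_a, n_b = len(times_a), len(times_b)
    mean_a = sum(times_a) / n_a
    mean_b = sum(times_b) / n_b
    # Under H0 (equal rates) the ratio of sample means is F(2 n_a, 2 n_b).
    stat = mean_a / mean_b
    dfn, dfd = 2 * n_a, 2 * n_b
    p = 2 * min(f.cdf(stat, dfn, dfd), f.sf(stat, dfn, dfd))
    return stat, min(p, 1.0)


print(rate_equality_test([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # equal means
print(rate_equality_test([10.0] * 20, [0.1] * 20))           # very different rates
```

A small p-value is evidence that the parent configuration affects the holding-time rate, i.e. against conditional independence for that state. A complete CTBN test would also need to compare where the variable jumps to, not only how long it waits.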


Hierarchical Multinomial-Dirichlet model for the estimation of conditional probability tables

Azzimonti, L., Corani, G., Zaffalon, M.

arXiv.org Machine Learning

We present a novel approach for estimating conditional probability tables, based on a joint, rather than independent, estimate of the conditional distributions belonging to the same table. We derive exact analytical expressions for the estimators and we analyse their properties both analytically and via simulation. We then apply this method to the estimation of parameters in a Bayesian network. Given the structure of the network, the proposed approach better estimates the joint distribution and significantly improves the classification performance with respect to traditional approaches.

I. INTRODUCTION

A Bayesian network is a probabilistic model constituted by a directed acyclic graph (DAG) and a set of conditional probability tables (CPTs), one for each node. The CPT of node X contains the conditional probability distributions of X given each possible configuration of its parents. Usually all variables are discrete and the conditional distributions are estimated adopting a Multinomial-Dirichlet model, where the Dirichlet prior is characterised by the vector of hyperparameters α. Yet Bayesian estimation of multinomials is sensitive to the choice of α, and inappropriate values cause the estimator to perform poorly [1].
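For reference, here is a sketch of the traditional approach that the paper improves on: each row of the CPT is estimated independently as the posterior mean of a Multinomial-Dirichlet model. The hierarchical (joint) estimator proposed in the paper is not implemented here. Sharing one α vector across all rows, and the function name, are illustrative assumptions.

```python
def cpt_posterior_mean(counts, alpha):
    """Posterior-mean CPT with an independent Dirichlet(alpha) prior per row.

    counts[j][k]: cases with parent configuration j and child state k
    alpha[k]:     Dirichlet hyperparameter, shared by every row here
    """
    a_sum = sum(alpha)
    table = []
    for row in counts:
        n_j = sum(row)
        table.append([(n_jk + a_k) / (n_j + a_sum) for n_jk, a_k in zip(row, alpha)])
    return table


counts = [[3, 1], [0, 0]]  # second parent configuration never observed
print(cpt_posterior_mean(counts, [1.0, 1.0]))    # first row: [4/6, 2/6]
print(cpt_posterior_mean(counts, [10.0, 10.0]))  # first row: [13/24, 11/24]
```

The same four observations yield a first row of about 0.67 vs 0.33 under α = (1, 1) but about 0.54 vs 0.46 under α = (10, 10), illustrating the sensitivity to α noted above. Because each row is estimated separately, an unobserved parent configuration simply falls back to the prior.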


Parameter Priors for Directed Acyclic Graphical Models and the Characterization of Several Probability Distributions

Geiger, Dan, Heckerman, David

arXiv.org Machine Learning

We show that the only parameter prior for complete Gaussian DAG models that satisfies global parameter independence, complete model equivalence, and some weak regularity assumptions is the normal-Wishart distribution. Our analysis is based on the following new characterization of the Wishart distribution: let $W$ be an $n \times n$, $n \ge 3$, positive-definite symmetric matrix of random variables and $f(W)$ be a pdf of $W$. Then, $f(W)$ is a Wishart distribution if and only if $W_{11} - W_{12} W_{22}^{-1} W'_{12}$ is independent of $\{W_{12}, W_{22}\}$ for every block partitioning $W_{11}, W_{12}, W'_{12}, W_{22}$ of $W$. Similar characterizations of the normal and normal-Wishart distributions are provided as well. We also show how to construct a prior for every DAG model over $X$ from the prior of a single regression model.


Learning Continuous Time Bayesian Networks

Nodelman, Uri, Shelton, Christian R., Koller, Daphne

arXiv.org Machine Learning

Continuous-time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite-state, continuous-time Markov process whose transition model is a function of its parents. We address the problem of learning the parameters and structure of a CTBN from fully observed data. We define a conjugate prior for CTBNs, and show how it can be used both for Bayesian parameter estimation and as the basis of a Bayesian score for structure learning. Because acyclicity is not a constraint in CTBNs, we show that the structure learning problem is significantly easier, both in theory and in practice, than structure learning for dynamic Bayesian networks (DBNs). Furthermore, because CTBNs can tailor the parameters and dependency structure to the different time granularities at which different variables evolve, they can provide a better fit to continuous-time processes than DBNs with a fixed time granularity.
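With fully observed data, learning the parameters of one CTBN variable under a fixed parent configuration reduces to two sufficient statistics: the total time spent in each state and the number of transitions between each pair of states. The sketch below computes them from a single trajectory and returns posterior means under a conjugate pairing that we assume here (a Gamma prior on each exit rate $q_x$ and a Dirichlet prior on the jump distribution $\theta_x$). The trajectory format, function names and default hyperparameters are hypothetical.

```python
from collections import defaultdict


def sufficient_stats(trajectory):
    """trajectory: time-ordered list of (state, duration) segments for one
    variable under a fixed parent configuration."""
    T = defaultdict(float)                     # total time spent in each state
    M = defaultdict(lambda: defaultdict(int))  # transition counts x -> x'
    for i, (state, duration) in enumerate(trajectory):
        T[state] += duration  # the last segment may be censored: time counts, no jump
        if i + 1 < len(trajectory):
            M[state][trajectory[i + 1][0]] += 1
    return T, M


def posterior_means(T, M, states, alpha_q=1.0, tau_q=1.0, alpha_theta=1.0):
    """Posterior means assuming Gamma(alpha_q, tau_q) on each exit rate q_x and a
    symmetric Dirichlet(alpha_theta) on where the variable jumps from x."""
    q, theta = {}, {}
    for x in states:
        m_x = sum(M[x].values())                      # number of exits from x
        q[x] = (alpha_q + m_x) / (tau_q + T[x])       # Gamma posterior mean
        targets = [y for y in states if y != x]
        denom = m_x + alpha_theta * len(targets)
        theta[x] = {y: (M[x][y] + alpha_theta) / denom for y in targets}
    return q, theta


traj = [("a", 1.0), ("b", 0.5), ("a", 2.0), ("c", 1.0)]
T, M = sufficient_stats(traj)
q, theta = posterior_means(T, M, ["a", "b", "c"])
print(q)      # q["a"] = (1 + 2) / (1 + 3) = 0.75
print(theta)  # theta["a"] = {"b": 0.5, "c": 0.5}
```

Because the counts and times factor by variable and parent configuration, the same statistics also feed a closed-form Bayesian structure score, and no acyclicity check is needed when adding parents.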